PLUGIN-1823: Retrying all SQLTransientExceptions #597

Open
wants to merge 27 commits into base: develop

sgarg-CS
Contributor

@sgarg-CS sgarg-CS commented May 16, 2025

PLUGIN-1823

Add a Failsafe retry policy to all the places in database-plugins where a SQLTransientException could be thrown.

Added three new properties (hidden from UI)

  • Initial Retry Duration (Default: 5 sec)
  • Max Retry Duration (Default: 80 sec)
  • Max Retry Count (Default: 5)
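
As a rough illustration of what these three defaults imply, the sketch below computes the wait before each retry. It assumes the delay doubles after every attempt and is capped at the maximum; the PR description does not state the factor, so the doubling, the class name, and the method name are assumptions. Plain Java is used here in place of the Failsafe API.

```java
import java.util.ArrayList;
import java.util.List;

public class BackoffSchedule {

  // Returns the delay (in seconds) before each retry, starting at the
  // initial delay and never exceeding the maximum delay.
  // Assumption: the delay doubles each time; the PR does not state the factor.
  public static List<Long> delays(long initialSeconds, long maxSeconds, int maxRetries) {
    List<Long> result = new ArrayList<>();
    long delay = initialSeconds;
    for (int i = 0; i < maxRetries; i++) {
      result.add(Math.min(delay, maxSeconds));
      delay = Math.min(delay * 2, maxSeconds);
    }
    return result;
  }

  public static void main(String[] args) {
    // Defaults from the PR description: 5 sec initial, 80 sec max, 5 retries.
    System.out.println(delays(5, 80, 5)); // prints [5, 10, 20, 40, 80]
  }
}
```

Under that assumption, a call that keeps failing would wait about 155 seconds in total before giving up.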

google-cla bot commented May 16, 2025

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@sgarg-CS sgarg-CS force-pushed the patch/plugin-1823 branch from 9da75b8 to ac813f0 Compare May 26, 2025 05:28
@sgarg-CS sgarg-CS added build and removed build labels May 27, 2025
@sgarg-CS sgarg-CS requested a review from itsankit-google May 30, 2025 04:56
Member

@itsankit-google itsankit-google left a comment

Overall, it looks like everything is getting wrapped within Failsafe, so we might end up with nested retries. We need to ensure we add retries only where we actually interact with the JDBC client, not around top-level functions.

For example, adding retries to DriverManager.getConnection(connectionString, connectionProperties) makes sense because you are actually interacting with the source DB, but adding retries to the whole of loadSchema(Connection connection, String query) does not make sense. We need to be careful while adding such retries.
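
To make the nested-retry concern concrete, here is a small sketch in plain Java (not Failsafe) that counts how often the innermost call runs when a retrying function is itself wrapped in a retry. The helper `withRetry`, the class name, and the attempt counts are hypothetical and only stand in for the real retry policy.

```java
import java.sql.SQLTransientException;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

public class NestedRetryDemo {

  // Minimal stand-in for a retry wrapper: runs the task up to maxAttempts
  // times (maxAttempts >= 1) and rethrows the last failure. No delay, to
  // keep the demo fast.
  public static <T> T withRetry(int maxAttempts, Callable<T> task) throws Exception {
    Exception last = new IllegalArgumentException("maxAttempts must be >= 1");
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return task.call();
      } catch (Exception e) {
        last = e;
      }
    }
    throw last;
  }

  // Counts how many times the innermost (database) call runs when an outer
  // retry wraps a function that already retries internally.
  public static int countNestedAttempts(int outerAttempts, int innerAttempts) {
    AtomicInteger calls = new AtomicInteger();
    // Stands in for the JDBC call itself; it always fails transiently.
    Callable<Integer> dbCall = () -> {
      calls.incrementAndGet();
      throw new SQLTransientException("simulated transient failure");
    };
    // Stands in for a top-level function that retries the JDBC call.
    Callable<Integer> topLevel = () -> withRetry(innerAttempts, dbCall);
    try {
      withRetry(outerAttempts, topLevel);
    } catch (Exception expected) {
      // Every attempt fails by design; only the call count matters here.
    }
    return calls.get();
  }

  public static void main(String[] args) {
    System.out.println(countNestedAttempts(3, 3)); // prints 9
  }
}
```

With three attempts at each level, the database is hit nine times instead of three, which is why the retry belongs only around the JDBC call.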

@itsankit-google
Member

Please note that the E2E tests should not need to be modified and should not fail with these changes. Otherwise, we have done something wrong and are no longer producing the expected failure messages.

Member

@itsankit-google itsankit-google left a comment

I can see some duplication between AbstractDBSource and AbstractDBSink. Can we please move it to the common AbstractDBUtil class?

Member

@itsankit-google itsankit-google left a comment

// Imports for the Failsafe and CDAP error-handling classes are omitted here
// because their package names depend on the library versions in use.
import com.google.common.base.Strings;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Properties;

public final class RetryUtils {

  private RetryUtils() {
    // utility class, not meant to be instantiated
  }

  public static Connection createConnectionWithRetry(RetryPolicy<?> retryPolicy, String connectionString,
                                                     Properties connectionProperties,
                                                     String externalDocumentationLink) throws Exception {
    try {
      return Failsafe.with(retryPolicy).get(() ->
        DriverManager.getConnection(connectionString, connectionProperties)
      );
    } catch (Exception e) {
      throw unwrapFailsafeException(e, externalDocumentationLink);
    }
  }

  public static Statement createStatementWithRetry(RetryPolicy<?> retryPolicy,
                                                   Connection connection,
                                                   String externalDocumentationLink) throws Exception {
    try {
      return Failsafe.with(retryPolicy).get(connection::createStatement);
    } catch (Exception e) {
      throw unwrapFailsafeException(e, externalDocumentationLink);
    }
  }

  public static PreparedStatement prepareStatementWithRetry(RetryPolicy<?> retryPolicy,
                                                            Connection connection,
                                                            String sqlQuery,
                                                            String externalDocumentationLink) throws Exception {
    try {
      return Failsafe.with(retryPolicy).get(() ->
        connection.prepareStatement(sqlQuery)
      );
    } catch (Exception e) {
      throw unwrapFailsafeException(e, externalDocumentationLink);
    }
  }

  // The Statement created here is not closed by this method; the caller can
  // reach it through ResultSet.getStatement() and must close it when done.
  public static ResultSet executeWithRetry(RetryPolicy<?> retryPolicy,
                                           Connection connection,
                                           String sqlQuery,
                                           String externalDocumentationLink) throws Exception {
    try {
      return Failsafe.with(retryPolicy).get(() -> connection.createStatement().executeQuery(sqlQuery));
    } catch (Exception e) {
      throw unwrapFailsafeException(e, externalDocumentationLink);
    }
  }

  private static Exception unwrapFailsafeException(Exception e, String externalDocumentationLink) {
    if (e instanceof FailsafeException && e.getCause() instanceof Exception) {
      // Inspect the wrapped cause, not the FailsafeException wrapper itself.
      Exception cause = (Exception) e.getCause();
      if (cause instanceof SQLException) {
        return programFailureException((SQLException) cause, externalDocumentationLink);
      }
      return cause;
    }
    return e;
  }

  private static ProgramFailureException programFailureException(SQLException e,
                                                                 String externalDocumentationLink) {
    // wrap exception to ensure SQLException-child instances not exposed to contexts without jdbc
    // driver in classpath
    String errorMessage =
      String.format("SQL Exception occurred: [Message='%s', SQLState='%s', ErrorCode='%s'].",
        e.getMessage(), e.getSQLState(), e.getErrorCode());
    String errorMessageWithDetails = String.format("Error occurred while trying to" +
      " get schema from database. Error message: '%s'. Error code: '%s'. SQLState: '%s'", e.getMessage(),
        e.getErrorCode(), e.getSQLState());

    if (!Strings.isNullOrEmpty(externalDocumentationLink)) {
      if (!errorMessage.endsWith(".")) {
        errorMessage = errorMessage + ".";
      }
      errorMessage = String.format("%s For more details, see %s", errorMessage, externalDocumentationLink);
    }
    return ErrorUtils.getProgramFailureException(new ErrorCategory(ErrorCategory.ErrorCategoryEnum.PLUGIN),
      errorMessage, errorMessageWithDetails, ErrorType.USER, false, ErrorCodeType.SQLSTATE, e.getSQLState(),
        externalDocumentationLink, e);
  }
}

You can create a RetryUtils class like the one above, which accepts the connection params.
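
For readers without Failsafe at hand, the core idea of such a helper (retry only transient SQL failures, let everything else fail fast) can be sketched with the standard library alone. This is a simplified stand-in, not the Failsafe-based code proposed above; the class name, method name, and fixed delay are hypothetical.

```java
import java.sql.SQLTransientException;
import java.util.concurrent.Callable;

public class TransientRetrySketch {

  // Runs the call, retrying only when it throws SQLTransientException.
  // Any other exception is treated as permanent and propagates immediately.
  public static <T> T callWithRetry(int maxRetries, long delayMillis, Callable<T> call)
      throws Exception {
    int failures = 0;
    while (true) {
      try {
        return call.call();
      } catch (SQLTransientException e) {
        failures++;
        if (failures > maxRetries) {
          throw e; // retries exhausted: surface the last transient failure
        }
        Thread.sleep(delayMillis); // fixed delay for brevity, no backoff
      }
    }
  }

  public static void main(String[] args) throws Exception {
    int[] attempts = {0};
    // Hypothetical flaky call: fails twice with a transient error, then succeeds.
    String result = callWithRetry(5, 10, () -> {
      if (++attempts[0] < 3) {
        throw new SQLTransientException("simulated timeout");
      }
      return "connected";
    });
    System.out.println(result + " after " + attempts[0] + " attempts");
  }
}
```

Passing the operation in as a `Callable` keeps the retry loop in one place, which is the same duplication concern raised for AbstractDBSource and AbstractDBSink.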

Move retry logic into a separate class: RetryUtils and add exception handling
@sgarg-CS
Contributor Author

sgarg-CS commented Jun 3, 2025

Overall, it looks like everything is getting wrapped within Failsafe, so we might end up with nested retries. We need to ensure we add retries only where we actually interact with the JDBC client, not around top-level functions.

For example, adding retries to DriverManager.getConnection(connectionString, connectionProperties) makes sense because you are actually interacting with the source DB, but adding retries to the whole of loadSchema(Connection connection, String query) does not make sense. We need to be careful while adding such retries.

Refactored the code to add the retry logic only for the methods interacting with the JDBC client.

…rn types for overridden methods to base class
Comment on lines 385 to 386
protected String getExternalDocumentationLink() {
- return null;
+ return "https://en.wikipedia.org/wiki/SQLSTATE";
Member

is this method still needed? can we remove it?

Contributor Author

Not needed now. Removed. 7c8815d

Comment on lines 165 to 167
protected String getExternalDocumentationLink() {
return "https://en.wikipedia.org/wiki/SQLSTATE";
}
Member

is this method still needed?

Contributor Author

No, not needed now. Removed it. 7c8815d

@@ -87,13 +86,17 @@ public abstract class AbstractDBSource<T extends PluginConfig & DatabaseSourceCo
Pattern.CASE_INSENSITIVE);
private static final Pattern WHERE_CONDITIONS = Pattern.compile("\\s+where \\$conditions",
Pattern.CASE_INSENSITIVE);
private final RetryPolicy<?> retryPolicy;

Member

please remove empty line

Contributor Author

Removed 0b57ce6

@itsmekumari itsmekumari force-pushed the patch/plugin-1823 branch 2 times, most recently from 71e3ddb to c6de31b Compare July 4, 2025 06:38
@@ -1,9 +1,9 @@
- errorMessageInvalidSourceDatabase=SQL error while getting query schema: Error: Unknown database 'invalidDatabase', SQLState: 42000, ErrorCode: 1049
+ errorMessageInvalidSourceDatabase=errorMessage: SQL Exception occurred
Member

I still see generic error messages which is bad.

Contributor

The earlier error message changed as a result of the new code. With the latest code, the expected error message no longer includes the dynamic or private data that the earlier message contained.

Contributor

As suggested, updated the error message.

Member

@itsankit-google itsankit-google left a comment

can we please add Open and Capture Logs step to these scenarios :

} else if (cause instanceof RuntimeException) {
return (RuntimeException) cause;
} else if (cause instanceof Error) {
return new RuntimeException("Failsafe wrapped an Error", cause);
Member

return new RuntimeException("Operation failed with error", cause);

Contributor Author

Updated 7977d45

} else if (cause instanceof Error) {
return new RuntimeException("Failsafe wrapped an Error", cause);
} else {
return new RuntimeException("Failsafe wrapped a non-runtime exception", cause);
Member

return new RuntimeException("Operation failed", cause);

Contributor Author

Updated 7977d45
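
Putting the two suggestions together, the tail of the unwrapping logic might look roughly like the sketch below. Only the branches visible in the diff are covered, the earlier branches are not shown in this conversation and are left out, and the class and method names are hypothetical.

```java
import java.sql.SQLException;

public class UnwrapSketch {

  // Maps the cause of a wrapped failure to a RuntimeException, using the
  // neutral messages suggested in the review instead of naming Failsafe.
  public static RuntimeException toRuntimeException(Throwable cause) {
    if (cause instanceof RuntimeException) {
      return (RuntimeException) cause; // already unchecked: pass through
    } else if (cause instanceof Error) {
      return new RuntimeException("Operation failed with error", cause);
    } else {
      return new RuntimeException("Operation failed", cause);
    }
  }

  public static void main(String[] args) {
    RuntimeException wrapped = toRuntimeException(new SQLException("bad state"));
    System.out.println(wrapped.getMessage() + " / cause: " + wrapped.getCause().getMessage());
  }
}
```

Keeping the original throwable as the cause means the neutral message does not hide the underlying failure from the logs.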

@itsmekumari
Contributor

can we please add Open and Capture Logs step to these scenarios :

Can we address these changes in another PR? We will create a new ticket to address these changes. Please confirm.

@itsankit-google
Member

Can we address these changes in another PR? We will create a new ticket to address these changes. Please confirm.

I am not sure how to verify the e2e changes then?

@itsmekumari
Contributor

Can we address these changes in another PR? We will create a new ticket to address these changes. Please confirm.

I am not sure how to verify the e2e changes then?

Changes done.

@sgarg-CS sgarg-CS force-pushed the patch/plugin-1823 branch from 7977d45 to 40ff60a Compare July 14, 2025 06:07
4 participants